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AN  ANALOGY  PERMITTING  MAXIMUM  LIKELIHOOD  ESTIMATION 
BY  A SIMPLE  MODIFICATION  OF  GENERAL  LEAST 
SQUARES  ALGORITHMS 

Louis  D.  Homer  and  R.  Clifton  Bailey 

Many  estimation  problems  require  nonlinear  least  squares 
estimation  or  other  iterative  maximum  likelihood  estimation. 

Whereas  many  general  programs  for  least  squares  regression  exist, 
general  programs  for  maximum  likelihood  estimation  are  less  common. 

General  discussions  of  the  maximum  likelihood  method  can  be 
found  in  numerous  papers.  A recent  text  by  Edwards  [5]  should 
serve  as  a good  introduction.  Several  algorithms  for  general  least 
squares  estimation  are  conveniently  summarized  by  Draper  and  Smith 
[4,  Ch.  10].  A more  extensive  treatment  of  algorithms  for  non- 
linear parameter  estimation  is  given  by  Bard  [2].  Our  technique 
for  modification  of  general  least  squares  algorithms  is  not  a 
weighted  least  squares  approach  as  presented  by  Bradley  [3]  and 
implemented  by  Jennrich  and  Moore  [6].  Since  our  approach  is  more 
direct,  we  anticipate  that  comparisons  will  favor  it  as  a basic 
computing  algorithm.  The  converted  computer  program  we  use  requires 
the  user  to  supply  a subroutine  with  the  log  likelihood  function. 
Weights  are  not  required.  No  programming  for  first  or  second 
partial  derivatives  need  be  added.  The  simplicity  of  conversion 
is  also  appealing. 

We  propose  to  summarize  the  major  steps  required  to  convert 
a general  least  squares  program  to  a general  maximum  likelihood 


program. 


The  typical  least  squares  problem  with  n observations,  a 
vector  x of  independent  variables,  and  k parameters  to  be  estimated 
consists  in  finding  parameters  0/  - (0j,  02.  •••»  9^)  to  minimize 


n - 2 

S(0)  - l (y,  - f (9,  • U) 

i-1 

The  usual  general  program  requires  the  user  to  specify  the  form 

of  f in  a subroutine  or  a few  lines  of  coding  and  to  choose  starting 

values  0 . The  more  versatile  programs  compute  numerical 
— o 

approximations  to  the  partial  derivatives  of  f with  respect  to  the 
parameters 


P - 
<nxk) 


3f 

30_' 


(2) 


and  also  compute 

4-P'(Z-f).  (3) 

(nxl) 

A new  estimate  0j  is  obtained  from  the  relation 

01  - 0^  + (P’P)'1  £.  (*) 

Next  02  may  be  obtained  from  0j,  and  so  on  until  successive 
changes  in  0 or  S are  deemed  suitably  small  and  the  0^  used  to 

A 

compute  the  smallest  S is  accepted  as  9_. 

The  iterative  maximum  likelihood  problem  consists  in  finding 

A 

£ - £ to  maximize 
n 

L-n  t. (0_,  Xj ) » <5> 

i-i 

where  L ^ is  the  likelihood  of  the  it^1  observation  and  L is  the 
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likelihood  function.  Again  we  have  n observations  comprising  x 
variables  for  each  observation  and  k parameters  to  estimate.  We 
assume  the  existence  of  the  regularity  conditions,  like  Kendall 
and  Stuart  [7].  Maximizing  log  L is  done  by  obtaining  successive 
0_'s  by 

32  log  L ( * 3 log  L 


6=9 
— r — r 

(kxl) 


f 32  log  L )~1 

-1  ' \ 36i  39j  J 

(kxk) 


3 0 
(kxl) 


, th 


where  0^  is  the  i element  of  the  vector  of  parameters  0. 
Alternatively,  one  might  invert  the  expected  value  of  the  (k  x k) 
matrix  above.  Neither  direct  use  of  the  matrix  nor  its  expected 
value  lends  itself  conveniently  to  the  construction  of  general 
programs  requiring  the  user  to  supply  only  the  form  of  or  log 
l . Additional  specifications  for  first  and  second  partial 
derivatives  of  log  L or  the  expectations  would  have  to  be  made. 

If,  however,  we  choose 


( 8 P li\P  i»e  ti\V1  5 
' ^-i  + ( th  \ 9 i A 3 i /)  Jx 


3 log  ^i 
Te  ’ 


(7) 


then  letting  u be  a vector  of  n ones  and  letting 
3 log  l 

= (3  log  IJ  30^} 


nxk 


3 0' 


and 


£ = P'u,  = (3  log  L/30^ } , 


(8) 


(9) 


we  may  obtain 


3 


0 


0 


-1 


f-1  + (P'P)  £ 


(10) 


a2  log  l 


30t  30J 


- P'P  + R 


(14) 


This  is  the  same  form  as  (4),  hence  it  is  possible  to  write  general 
maY<nnim  likelihood  programs  which  are  as  versatile  and  easy  to  use 
as  the  general  nonlinear  least  squares  programs.  Conversion  of 
nonlinear  least  squares  programs  for  use  in  maximum  likelihood 
estimation  is  easy. 

If  the  regularity  conditions  mentioned  before  hold,  we  have 


E(P’P), 


(11) 


so  that  (P’P)-1  is  a suitable  estimate  of  the  covariance  matrix 
of  the  estimated  parameters.  In  fact.  It  is  a generalized 
compromise  between  using  the  inverse  of  3q  — and  the  inverse  of 
its  expectation. 

Recall  that 


32  log  L 
39i  30j 


and 


n 

I 

m*l 


* 3 l 3 l , 

1 m m 1 

(2  30  Tm 

m 


32  l 


m 


30t  30^ 


(12) 


3 log  U ’ / 3 log  L\  m £ 
,30.  I » 30  1 


'J 


m»l 


3 l 3 l 


. „ _ . , n . 3 *3  t 

1 m m , v i ® Q. 

? "i  "j  ji  ,ei  ,ei 

. m hjhn 


.(13) 


To  facilitate  comparisons  we  rewrite  12  and  13: 
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The  expectation  of  R is  zero.  This  may  be  seen  by  noting  that 
the  expectation  of  T is  zero  and  that  the  expectation  of  the  left- 
hand  side  of  14  is  the  negative  of  the  expectation  of  the  left-hand 
side  of  15.  Since  we  drop  R,  our  general  approach  uses  a partial 
expectation,  dropping  terms  known  generally  to  have  expectation 
zero.  In  the  particular  case  of  a multinomial,  - P'P  is  the 
matrix  of  second  partial  derivatives,  and  consequently  the 
algorithm  suggested  is  exactly  equivalent  to  the  algorithm  obtained 
by  using  the  expectation  of  the  second  partial  derivatives.  In 
the' case  of  estimating  the  variance  and  mean  of  a normal 
distribution,  the  corresponding  P'P  includes  off-diagonal  elements 
whose  expectation  is  zero  but  whose  numerical  value  will  only  be 
zero  if  the  third  sample  moment  about  the  mean  happens  to  be  zero. 

In  this  case  the  suggested  algorithm  differs  from  that  using  the 
expectation  of  the  second  partial  derivatives  because  of  these 
possibly  nonzero  off-diagonal  elements.  We  may  regard  the  operation 
of  taking  the  expectation  of  the  second  partial  derivatives  of  log 
L as  having  a general  answer  which  we  take  advantage  of  and  a 
particular  part  which  we  ignore  to  preserve  the  generality  and 
simplicity  of  our  numerical  approach.  In  some  instances  where  more 
exact  results  are  desired,  E(P'P)  may  be  easier  to  compute  than 


We  emphasize  that  this  compromise  in  the  choice  of  an -estimator 
of  the  covariance  matrix  means  that  one  can  readily  convert  least 
squares  programs  to  maximum  likelihood  estimation,  and  these 
programs  need  not  require  the  user  to  supply  specific  programming 
for  the  partial  derivatives. 

Other  changes  we  suggest  are  in  the  accumulation  of  - log  L 
instead  of  S(£).  We  suggest  - log  L Instead  of  log  L since  this 
makes  it  a minimization  problem  and  many  least  squares  programs 
check  to  make  sure  that  S(0j+^)  < S(6^)  before  accepting  the  new 
set  of  parameters.  Computation  of  the  covariance  matrix  in  the 
least  squares  case  is 

V = o2  (P'P)"1.  (16) 

For  the  maximum  likelihood  case  the  scalar  multiplication  must  be 
removed,  that  is, 

V « (P’P)'1.  (17) 

Other  details  involving  input  and  output  would  surely  have 

to  be  revised  but  should  not  pose  difficult  problems.  Since  these 
are  apt  to  be  specific  to  the  particular  program  being  converted, 
an  extensive  discussion  here  seems  pointless. 

A detailed  example  of  conversion  is  available  for  a 138-line 
BASIC  program  [1]  originally  designed  for  nonlinear  least  squares 
regression  on  an  interactive  time-sharing  system  using  Marquardt's 
algorithm.  The  program  requires  modification  of  7 lines  for 
conversion  to  a general  maximum  likelihood  program.  The  converted 
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program  requires  the  user  to  supply  only  the  programming  needed  to 
calculate  the  log  likelihood  for  an  individual  observation.  This 
will  usually  be  just  a few  lines  of  code.  The  output  includes 
parameter  estimates,  the  log  likelihood,  and  the  covariance  matrix. 
Techniques  for  using  the  program  output  to  obtain  confidence 
regions  about  functions  of  the  parameters  by  means  of  propagation 
of  error  formulas  are  also  illustrated. 
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